Name | Version | Summary | Date |
---|---|---|---|
minference | 0.1.5.post1 | To speed up long-context LLMs' inference, MInference computes attention with approximate, dynamic sparse patterns, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy. | 2024-08-13 09:39:09 |
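
The summary describes MInference's approach: during the pre-filling stage it replaces full dense attention with approximate, dynamically chosen sparse attention patterns. Below is a minimal sketch of patching a Hugging Face model, following the usage pattern shown in the MInference README; the specific model name and prompt here are illustrative assumptions, not part of the listing above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from minference import MInference

# Assumed long-context model for illustration; any supported
# long-context checkpoint could be substituted here.
model_name = "gradientai/Llama-3-8B-Instruct-262k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Patch the model's attention with MInference's dynamic sparse
# attention to accelerate pre-filling over long prompts.
minference_patch = MInference("minference", model_name)
model = minference_patch(model)

# Standard generation afterwards; the speedup applies when the
# prompt is long enough for sparse pre-filling to matter.
inputs = tokenizer("A very long prompt ...", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
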
Downloads (last hour) | Downloads (last day) | Downloads (last week) | Downloads (total) |
---|---|---|---|
44 | 1215 | 7403 | 283314 |